
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta charset="utf-8" />
    <title>urllib.robotparser --- Parser for robots.txt &#8212; Python 3.7.8 documentation</title>
    <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="../_static/language_data.js"></script>
    <script type="text/javascript" src="../_static/translations.js"></script>
    
    <script type="text/javascript" src="../_static/sidebar.js"></script>
    
    <link rel="search" type="application/opensearchdescription+xml"
          title="Search within Python 3.7.8 documentation"
          href="../_static/opensearch.xml"/>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="copyright" title="Copyright" href="../copyright.html" />
    <link rel="next" title="http --- HTTP modules" href="http.html" />
    <link rel="prev" title="urllib.error --- Exception classes raised by urllib.request" href="urllib.error.html" />
    <link rel="shortcut icon" type="image/png" href="../_static/py.png" />
    <link rel="canonical" href="https://docs.python.org/3/library/urllib.robotparser.html" />
    
    <script type="text/javascript" src="../_static/copybutton.js"></script>
    
    
    
    
    <style>
      @media only screen {
        table.full-width-table {
            width: 100%;
        }
      }
    </style>
 

  </head><body>
  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="module-urllib.robotparser">
<span id="urllib-robotparser-parser-for-robots-txt"></span><h1><a class="reference internal" href="#module-urllib.robotparser" title="urllib.robotparser: Load a robots.txt file and answer questions about fetchability of other URLs."><code class="xref py py-mod docutils literal notranslate"><span class="pre">urllib.robotparser</span></code></a> --- Parser for robots.txt<a class="headerlink" href="#module-urllib.robotparser" title="Permalink to this headline">¶</a></h1>
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/urllib/robotparser.py">Lib/urllib/robotparser.py</a></p>
<hr class="docutils" id="index-0" />
<p>This module provides a single class, <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">RobotFileParser</span></code></a>, which answers questions about whether a particular user agent can fetch a URL on the Web site that published the <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> file. For more details on the structure of <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> files, see <a class="reference external" href="http://www.robotstxt.org/orig.html">http://www.robotstxt.org/orig.html</a>.</p>
<dl class="class">
<dt id="urllib.robotparser.RobotFileParser">
<em class="property">class </em><code class="sig-prename descclassname">urllib.robotparser.</code><code class="sig-name descname">RobotFileParser</code><span class="sig-paren">(</span><em class="sig-param">url=''</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser" title="Permalink to this definition">¶</a></dt>
<dd><p>This class provides methods to read, parse and answer questions about the <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> file at <em>url</em>.</p>
<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.set_url">
<code class="sig-name descname">set_url</code><span class="sig-paren">(</span><em class="sig-param">url</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.set_url" title="Permalink to this definition">¶</a></dt>
<dd><p>Sets the URL referring to a <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> file.</p>
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.read">
<code class="sig-name descname">read</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.read" title="Permalink to this definition">¶</a></dt>
<dd><p>Reads the <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> URL and feeds it to the parser.</p>
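<p>A minimal sketch of <code class="docutils literal notranslate"><span class="pre">set_url()</span></code> followed by <code class="docutils literal notranslate"><span class="pre">read()</span></code>. So that it runs without network access, it assumes a temporary local file served through a <code class="docutils literal notranslate"><span class="pre">file:</span></code> URL stands in for a real site. The rule text, the <code class="docutils literal notranslate"><span class="pre">ExampleBot</span></code> agent name and the example.com URLs are illustrative assumptions.</p>

```python
import pathlib
import tempfile
import urllib.robotparser

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical robots.txt written to disk; a real crawler would
    # point set_url() at https://some-site/robots.txt instead.
    robots_file = pathlib.Path(tmp) / "robots.txt"
    robots_file.write_text("User-agent: *\nDisallow: /private/\n", encoding="utf-8")

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_file.as_uri())  # a file:// URL for the temporary file
    rp.read()                         # fetches the URL and parses its contents

# The rules stay loaded in the parser after the file is gone.
blocked = rp.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = rp.can_fetch("ExampleBot", "https://example.com/index.html")
print(blocked, allowed)
```

<p>Against a real server, <code class="docutils literal notranslate"><span class="pre">read()</span></code> performs network I/O, so a crawler would normally wrap the call in its own error handling.</p>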
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.parse">
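<code class="sig-name descname">parse</code><span class="sig-paren">(</span><em class="sig-param">lines</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.parse" title="Permalink to this definition">¶</a></dt>
<dd><p>Parses the <em>lines</em> argument, an iterable of the lines of a <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> file.</p>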
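<p>As a rough illustration, rules that are already in memory can be handed to <code class="docutils literal notranslate"><span class="pre">parse()</span></code> directly, with no URL involved. The rule text and the <code class="docutils literal notranslate"><span class="pre">ExampleBot</span></code> agent name below are assumptions made for the example.</p>

```python
import urllib.robotparser

# Hypothetical robots.txt content held in a string.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
# parse() expects the individual lines, not one big string.
rp.parse(robots_txt.splitlines())

private_ok = rp.can_fetch("ExampleBot", "https://example.com/private/data.html")
public_ok = rp.can_fetch("ExampleBot", "https://example.com/index.html")
print(private_ok, public_ok)
```

<p>This form is convenient in tests, or when the file has been downloaded by some other HTTP client.</p>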
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.can_fetch">
<code class="sig-name descname">can_fetch</code><span class="sig-paren">(</span><em class="sig-param">useragent</em>, <em class="sig-param">url</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.can_fetch" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns <code class="docutils literal notranslate"><span class="pre">True</span></code> if the <em>useragent</em> is allowed to fetch the <em>url</em> according to the rules contained in the parsed <code class="file docutils literal notranslate"><span class="pre">robots.txt</span></code> file.</p>
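<p>The sketch below shows how the answer can depend on the user agent. It assumes a hypothetical rule set with one entry for <code class="docutils literal notranslate"><span class="pre">ExampleBot</span></code> and a catch-all <code class="docutils literal notranslate"><span class="pre">*</span></code> entry. The agent names and URLs are made up, and the more specific <code class="docutils literal notranslate"><span class="pre">Allow</span></code> line is deliberately listed before the broader <code class="docutils literal notranslate"><span class="pre">Disallow</span></code> line.</p>

```python
import urllib.robotparser

# Hypothetical rules: ExampleBot may read public docs only,
# every other agent is kept out entirely.
robots_txt = """\
User-agent: ExampleBot
Allow: /docs/public/
Disallow: /docs/

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

base = "https://example.com"
results = {
    "bot_public": rp.can_fetch("ExampleBot/1.0", base + "/docs/public/a.html"),
    "bot_internal": rp.can_fetch("ExampleBot/1.0", base + "/docs/internal.html"),
    "bot_home": rp.can_fetch("ExampleBot/1.0", base + "/index.html"),
    "other_home": rp.can_fetch("OtherBot", base + "/index.html"),
}
for name, allowed in results.items():
    print(name, allowed)
```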
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.mtime">
<code class="sig-name descname">mtime</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.mtime" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the time the <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> file was last fetched. This is useful for long-running web spiders that need to check for new <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> files periodically.</p>
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.modified">
<code class="sig-name descname">modified</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.modified" title="Permalink to this definition">¶</a></dt>
<dd><p>Sets the time the <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> file was last fetched to the current time.</p>
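<p>One way a long-running crawler might use <code class="docutils literal notranslate"><span class="pre">mtime()</span></code> and <code class="docutils literal notranslate"><span class="pre">modified()</span></code> together is sketched below. The helper <code class="docutils literal notranslate"><span class="pre">needs_refresh</span></code> and the one-hour limit are hypothetical, not part of the module. The sketch also relies on the CPython implementation detail that <code class="docutils literal notranslate"><span class="pre">mtime()</span></code> returns <code class="docutils literal notranslate"><span class="pre">0</span></code> until rules have been loaded or <code class="docutils literal notranslate"><span class="pre">modified()</span></code> has been called.</p>

```python
import time
import urllib.robotparser

MAX_AGE_SECONDS = 3600  # assumed refresh interval, chosen for illustration


def needs_refresh(rp, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the rules were never loaded or are older than max_age."""
    now = time.time() if now is None else now
    last_fetched = rp.mtime()  # 0 means "never fetched" in CPython
    return last_fetched == 0 or now - last_fetched > max_age


rp = urllib.robotparser.RobotFileParser()
stale_before = needs_refresh(rp)  # nothing loaded yet

rp.modified()                     # record the current time as the fetch time
stale_after = needs_refresh(rp)

# Simulate checking again two hours later.
stale_later = needs_refresh(rp, now=rp.mtime() + 7200)
print(stale_before, stale_after, stale_later)
```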
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.crawl_delay">
<code class="sig-name descname">crawl_delay</code><span class="sig-paren">(</span><em class="sig-param">useragent</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.crawl_delay" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the value of the <code class="docutils literal notranslate"><span class="pre">Crawl-delay</span></code> parameter from <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> for the <em>useragent</em> in question. If there is no such parameter, or it doesn't apply to the <em>useragent</em> specified, or the <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> entry for this parameter has invalid syntax, returns <code class="docutils literal notranslate"><span class="pre">None</span></code>.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.6.</span></p>
</div>
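<p>Because <code class="docutils literal notranslate"><span class="pre">crawl_delay()</span></code> may return <code class="docutils literal notranslate"><span class="pre">None</span></code>, callers usually need a fallback. The sketch below assumes hypothetical rules in which only <code class="docutils literal notranslate"><span class="pre">SlowBot</span></code> has a <code class="docutils literal notranslate"><span class="pre">Crawl-delay</span></code> line. The <code class="docutils literal notranslate"><span class="pre">FALLBACK_DELAY</span></code> constant is likewise an assumption, not something the module defines.</p>

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: SlowBot",
    "Crawl-delay: 10",
    "",
    "User-agent: *",
    "Disallow: /tmp/",
])

slow_delay = rp.crawl_delay("SlowBot")      # taken from the SlowBot entry
default_delay = rp.crawl_delay("OtherBot")  # the "*" entry has no Crawl-delay

# Assumed politeness default used when the site does not request a delay.
FALLBACK_DELAY = 1
effective_delay = default_delay if default_delay is not None else FALLBACK_DELAY
print(slow_delay, default_delay, effective_delay)
```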
</dd></dl>

<dl class="method">
<dt id="urllib.robotparser.RobotFileParser.request_rate">
<code class="sig-name descname">request_rate</code><span class="sig-paren">(</span><em class="sig-param">useragent</em><span class="sig-paren">)</span><a class="headerlink" href="#urllib.robotparser.RobotFileParser.request_rate" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the contents of the <code class="docutils literal notranslate"><span class="pre">Request-rate</span></code> parameter from <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> as a <a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a> <code class="docutils literal notranslate"><span class="pre">RequestRate(requests,</span> <span class="pre">seconds)</span></code>. If there is no such parameter, or it doesn't apply to the <em>useragent</em> specified, or the <code class="docutils literal notranslate"><span class="pre">robots.txt</span></code> entry for this parameter has invalid syntax, returns <code class="docutils literal notranslate"><span class="pre">None</span></code>.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.6.</span></p>
</div>
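<p>A small sketch of turning the returned named tuple into a minimum spacing between requests. The rule text and agent name are assumptions, and spreading the requests evenly over the window is only one possible reading of <code class="docutils literal notranslate"><span class="pre">Request-rate</span></code>.</p>

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Request-rate: 3/20",  # hypothetical: 3 requests per 20 seconds
])

rate = rp.request_rate("ExampleBot")
if rate is not None:
    # Fields are available by name, or by unpacking the tuple.
    requests, seconds = rate
    min_interval = seconds / requests
else:
    # No usable Request-rate line applies to this agent.
    min_interval = 0.0
print(rate, min_interval)
```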
</dd></dl>

</dd></dl>

<p>The following example demonstrates basic use of the <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code class="xref py py-class docutils literal notranslate"><span class="pre">RobotFileParser</span></code></a> class:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">urllib.robotparser</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">robotparser</span><span class="o">.</span><span class="n">RobotFileParser</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span><span class="o">.</span><span class="n">set_url</span><span class="p">(</span><span class="s2">&quot;http://www.musi-cal.com/robots.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rrate</span> <span class="o">=</span> <span class="n">rp</span><span class="o">.</span><span class="n">request_rate</span><span class="p">(</span><span class="s2">&quot;*&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rrate</span><span class="o">.</span><span class="n">requests</span>
<span class="go">3</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rrate</span><span class="o">.</span><span class="n">seconds</span>
<span class="go">20</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span><span class="o">.</span><span class="n">crawl_delay</span><span class="p">(</span><span class="s2">&quot;*&quot;</span><span class="p">)</span>
<span class="go">6</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span><span class="o">.</span><span class="n">can_fetch</span><span class="p">(</span><span class="s2">&quot;*&quot;</span><span class="p">,</span> <span class="s2">&quot;http://www.musi-cal.com/cgi-bin/search?city=San+Francisco&quot;</span><span class="p">)</span>
<span class="go">False</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">rp</span><span class="o">.</span><span class="n">can_fetch</span><span class="p">(</span><span class="s2">&quot;*&quot;</span><span class="p">,</span> <span class="s2">&quot;http://www.musi-cal.com/&quot;</span><span class="p">)</span>
<span class="go">True</span>
</pre></div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>  
    <div class="footer">
    &copy; <a href="../copyright.html">Copyright</a> 2001-2020, Python Software Foundation.
    <br />
    The Python Software Foundation is a non-profit corporation.
    <a href="https://www.python.org/psf/donations/">Please donate.</a>
    <br />
    Last updated on Jun 29, 2020.
    <a href="../bugs.html">Found a bug</a>?
    <br />
    Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.3.1.
    </div>

  </body>
</html>